Isoform-level gene signature improves prognostic stratification and accurately classifies glioblastoma subtypes.

Author: Bi Yingtao
Davuluri Ramana V
Macyszyn Luke
O'Rourke Donald M
Pal Sharmistha
Showe Louise C
Publication venue: eScholarship, University of California
Publication date: 06/02/2014

Molecular stratification of tumors is essential for developing personalized therapies. Although patient stratification strategies have been successful, computational methods to accurately translate a gene signature from a high-throughput platform to a clinically adaptable low-dimensional platform are currently lacking. Here, we describe PIGExClass (platform-independent isoform-level gene-expression based classification-system), a novel computational approach to derive and then transfer gene signatures from one analytical platform to another. We applied PIGExClass to design a reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) based molecular-subtyping assay for glioblastoma multiforme (GBM), the most aggressive primary brain tumor. Unsupervised clustering of TCGA (The Cancer Genome Atlas Consortium) GBM samples, based on isoform-level gene-expression profiles, recaptured the four known molecular subgroups but switched the subtype for 19% of the samples, resulting in significant (P = 0.0103) survival differences among the refined subgroups. The PIGExClass-derived four-class classifier, which requires only 121 transcript variants, assigns GBM patients' molecular subtype with 92% accuracy. This classifier was translated to an RT-qPCR assay and validated in an independent cohort of 206 GBM samples. Our results demonstrate the efficacy of PIGExClass in the design of a clinically adaptable molecular subtyping assay and have implications for developing robust diagnostic assays for cancer patient stratification.

eScholarship - University of California

IsoformEx: isoform level gene expression estimation using weighted non-negative least squares from mRNA-Seq data

Author: Bi Yingtao
Davuluri Ramana V
Gupta Ravi
Kim Hyunsoo
Pal Sharmistha
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: mRNA-Seq technology has revolutionized the field of transcriptomics for identification and quantification of gene transcripts not only at gene level but also at isoform level. Estimating the expression levels of transcript isoforms from mRNA-Seq data is a challenging problem due to the presence of constitutive exons. RESULTS: We propose a novel algorithm (IsoformEx) that employs a weighted non-negative least squares estimation method to estimate the expression levels of transcript isoforms. Validations based on in silico simulation of mRNA-Seq and qRT-PCR experiments with real mRNA-Seq data showed that IsoformEx could accurately estimate transcript expression levels. In comparisons with published methods, the transcript expression levels estimated by IsoformEx showed higher correlation with known transcript expression levels from simulated mRNA-Seq data, and higher agreement with qRT-PCR measurements of specific transcripts for real mRNA-Seq data. CONCLUSIONS: IsoformEx is a fast and accurate algorithm to estimate transcript expression levels and gene expression levels, which takes into account short exons and alternative exons with a weighting scheme. The software is available at http://bioinformatics.wistar.upenn.edu/isoformex.
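
To make the idea of weighted non-negative least squares concrete, here is a minimal sketch in Python. It is not the IsoformEx implementation: the exon-by-isoform matrix, the weights, the read counts, and the projected-gradient solver are all illustrative assumptions chosen for a toy gene with three exons and two isoforms. The objective is to minimise sum_i w[i] * (y[i] - (A x)[i])^2 subject to x >= 0, where y holds per-exon signal and x holds isoform abundances.

```python
def weighted_nnls(A, y, w, iterations=5000):
    """Minimise sum_i w[i] * (y[i] - (A x)[i])**2 subject to x >= 0.

    Uses plain projected gradient descent, which is adequate for a toy
    problem; a production solver would use a dedicated NNLS routine.
    """
    m, n = len(A), len(A[0])
    # Conservative step size: the gradient's Lipschitz constant is
    # 2 * lambda_max(A^T W A), which is bounded above by 2 * trace(A^T W A).
    trace = sum(w[i] * A[i][j] ** 2 for i in range(m) for j in range(n))
    step = 1.0 / (2.0 * trace)
    x = [0.0] * n
    for _ in range(iterations):
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        gradient = [-2.0 * sum(w[i] * A[i][j] * residual[i] for i in range(m))
                    for j in range(n)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n)]
    return x


# Hypothetical toy gene: rows are exons, columns are isoforms.
# Exon 1 is constitutive (shared), exon 2 is specific to isoform 1,
# exon 3 is specific to isoform 2.
A_TOY = [[1.0, 1.0],
         [1.0, 0.0],
         [0.0, 1.0]]
# Assumed per-exon weights (for example, down-weighting a short exon).
W_TOY = [1.0, 0.5, 2.0]
# Assumed per-exon signal generated from abundances (3, 5).
Y_TOY = [8.0, 3.0, 5.0]

estimate = weighted_nnls(A_TOY, Y_TOY, W_TOY)
print([round(v, 4) for v in estimate])
```

The non-negativity constraint matters when the exon signal is noisy: an unconstrained fit can assign a negative abundance to an isoform, whereas the projection step clamps it to zero.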

Directory of Open Access Journals

Isoform level expression profiles provide better cancer signatures than gene level expression profiles

Author: Julia Tchou
Ramana V Davuluri
Sharmistha Pal
Yingtao Bi
ZhongFa Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Evaluation of data discretization methods to derive platform independent isoform expression signatures for multi-class tumor subtyping

Author: Ramana V Davuluri
Segun Jung
Yingtao Bi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Annotation of gene promoters by integrative data-mining of ChIP-seq Pol-II enrichment data

Author: Anirban Bhattacharyya
Francisco A Perez
Priyankara Wikramasinghe
Ramana V Davuluri
Ravi Gupta
Sharmistha Pal
Publication venue: Springer Nature
Publication date: 01/01/2010

BACKGROUND: Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massively parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context. METHODS: We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied to seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters. RESULTS: We present a novel algorithm based on supervised learning methods to discriminate promoter-associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy.
We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 (4,747) protein-coding (non-coding) genes with single promoters and 9,929 (1,858) protein-coding (non-coding) genes with two or more alternative promoters, and a significant number of unassigned novel promoters. CONCLUSION: Our new algorithm can successfully predict the promoters from the genome-wide profile of Pol-II bound regions. In addition, our algorithm performs significantly better than existing promoter prediction methods and can be applied for genome-wide predictions of Pol-II promoters.

OCAD University Open Research Repository

Genome-wide analysis of core promoter elements from conserved human and mouse orthologous pairs

Author: Agosto-Pérez Francisco J
Davuluri Ramana V
Jin Victor X
Liyanarachchi Sandya
Singer Gregory AC
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The canonical core promoter elements consist of the TATA box, initiator (Inr), downstream core promoter element (DPE), TFIIB recognition element (BRE) and the newly-discovered motif 10 element (MTE). The motifs for these core promoter elements are highly degenerate, which tends to lead to a high false discovery rate when attempting to detect them in promoter sequences. RESULTS: In this study, we have performed the first analysis of these core promoter elements in orthologous mouse and human promoters with experimentally-supported transcription start sites. We have identified these various elements using a combination of positional weight matrices (PWMs) and the degree of conservation of orthologous mouse and human sequences – a procedure that significantly reduces the false positive rate of motif discovery. Our analysis of 9,010 orthologous mouse-human promoter pairs revealed two combinations of three-way synergistic effects, TATA-Inr-MTE and BRE-Inr-MTE. The former has previously been putatively identified in human, but the latter represents a novel synergistic relationship. CONCLUSION: Our results demonstrate that DNA sequence conservation can greatly improve the identification of functional core promoter elements in the human genome. The data also underscore the importance of synergistic occurrence of two or more core promoter elements. Furthermore, the sequence data and results presented here can help build better computational models for predicting the transcription start sites in the promoter regions, which remains one of the most challenging problems.
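
The combination of PWM scoring with cross-species conservation can be sketched as follows. This is only an illustration of the general idea, not the authors' procedure: the four-column TATA-like matrix, the uniform background, the threshold, and the two short sequences are hypothetical, and the human and mouse sequences are assumed to be already aligned without gaps and to contain only uppercase A, C, G, T.

```python
import math

BASES = "ACGT"


def log_odds_pwm(prob_columns, background=0.25):
    """Convert per-position base probabilities into log2-odds scores."""
    return [{b: math.log2(col[b] / background) for b in BASES}
            for col in prob_columns]


def scan(seq, pwm, threshold):
    """Return the set of start positions whose window scores >= threshold."""
    width = len(pwm)
    hits = set()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= threshold:
            hits.add(start)
    return hits


def conserved_hits(human, mouse, pwm, threshold):
    """Keep only positions that pass the threshold in both aligned sequences."""
    return sorted(scan(human, pwm, threshold) & scan(mouse, pwm, threshold))


def column(consensus):
    """Hypothetical PWM column: 0.85 for the consensus base, 0.05 otherwise."""
    return {b: (0.85 if b == consensus else 0.05) for b in BASES}


# Illustrative TATA-like matrix; real core promoter PWMs are longer and
# far more degenerate.
TOY_PWM = log_odds_pwm([column(c) for c in "TATA"])
TOY_THRESHOLD = 6.0  # assumed cutoff; only perfect matches pass here

HUMAN = "GGCTATAAGGCGTATACC"
MOUSE = "GGCTATAAGGCGTGTACC"  # second site disrupted by a single substitution

print(sorted(scan(HUMAN, TOY_PWM, TOY_THRESHOLD)))
print(conserved_hits(HUMAN, MOUSE, TOY_PWM, TOY_THRESHOLD))
```

In this toy case the human sequence alone yields two candidate sites, but only the one that is also present at the same position in the mouse sequence survives the conservation filter, which is how requiring conservation can prune spurious matches to a degenerate motif.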

Directory of Open Access Journals

Genome-wide analysis of alternative promoters of human genes using a custom promoter tiling array

Author: Davuluri Ramana V
Huang Tim HM
Plass Christoph
Singer Gregory AC
Wu Jiejun
Yan Pearlly
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Independent lines of evidence suggested that a large fraction of human genes possess multiple promoters driving gene expression from distinct transcription start sites. Understanding which promoter is employed in which cellular context is required to unravel gene regulatory networks within the cell. RESULTS: We have developed a custom microarray platform that tiles roughly 35,000 alternative putative promoters from nearly 7,000 genes in the human genome. To demonstrate the utility of this array platform, we have analyzed the patterns of promoter usage in 17β-estradiol (E2)-treated and untreated MCF7 cells and show widespread usage of alternative promoters. Most intriguingly, we show that the downstream promoter in E2-sensitive multiple promoter genes tends to be very close to the 3'-terminus of the gene, suggesting exotic mechanisms of expression regulation in these genes. CONCLUSION: The usage of alternative promoters greatly multiplies the transcriptional complexity available within the human genome. The fact that many of these promoters are incapable of driving the synthesis of a meaningful protein-encoding transcript further complicates the story.

Directory of Open Access Journals

Genome-wide analysis of host-chromosome binding sites for Epstein-Barr Virus Nuclear Antigen 1 (EBNA1)

Author: Davuluri Ramana V
Lieberman Paul M
Lu Fang
Norseen Julie
Showe Louise
Tsai Kevin
Wang Pu
Wikramasinghe Priyankara
Publication venue: BioMed Central
Publication date: 01/01/2010

The Epstein-Barr Virus (EBV) Nuclear Antigen 1 (EBNA1) protein is required for the establishment of EBV latent infection in proliferating B-lymphocytes. EBNA1 is a multifunctional DNA-binding protein that stimulates DNA replication at the viral origin of plasmid replication (OriP), regulates transcription of viral and cellular genes, and tethers the viral episome to the cellular chromosome. EBNA1 also provides a survival function to B-lymphocytes, potentially through its ability to alter cellular gene expression. To better understand these various functions of EBNA1, we performed a genome-wide analysis of the viral and cellular DNA sites associated with EBNA1 protein in a latently infected Burkitt lymphoma B-cell line. Chromatin-immunoprecipitation (ChIP) combined with massively parallel deep-sequencing (ChIP-Seq) was used to identify cellular sites bound by EBNA1. Sites identified by ChIP-Seq were validated by conventional real-time PCR, and ChIP-Seq provided quantitative, high-resolution detection of the known EBNA1 binding sites on the EBV genome at OriP and Qp. We identified at least one cluster of unusually high-affinity EBNA1 binding sites on chromosome 11, between the divergent FAM55D and FAM55B genes. A consensus for all cellular EBNA1 binding sites is distinct from those derived from the known viral binding sites, suggesting that some of these sites are indirectly bound by EBNA1. EBNA1 also bound close to the transcriptional start sites of a large number of cellular genes, including HDAC3, CDC7, and MAP3K1, which we show are positively regulated by EBNA1. EBNA1 binding sites were enriched in some repetitive elements, especially LINE 1 retrotransposons, and had weak correlations with histone modifications and ORC binding. We conclude that EBNA1 can interact with a large number of cellular genes and chromosomal loci in latently infected cells, but that these sites are likely to represent a complex ensemble of direct and indirect EBNA1 binding sites.